Education Apps: Market Trends, Monetization, and Growth Opportunities¶
Introduction¶
The education app market has experienced rapid growth in recent years, driven by increased mobile device adoption, digital learning trends, and demand for accessible educational content. This analysis leverages a dataset of over 2 million apps to uncover key trends, revenue strategies, and growth opportunities within the education category.
Objectives of this analysis:
- Identify the distribution of education apps by type (free vs paid) and monetization strategy (ads, in-app purchases, freemium models).
- Explore trends in app downloads and user engagement to determine which strategies correlate with higher reach.
- Provide actionable insights for companies and app developers to optimize revenue, improve user acquisition, and prioritize app development focus areas.
Dataset Overview:
- Size: 2,000,000+ apps
- Features: app category, pricing model, average installs, ratings, revenue indicators, and more
- Scope: Analysis focuses specifically on apps within the Education category
Key Value for Stakeholders:
By analyzing market patterns and monetization strategies, companies can make informed decisions about app development, marketing, and pricing, targeting segments with the highest growth and revenue potential.
Packages & Setup¶
We’ll use these packages for data cleaning, analysis, and visualization.
# Import the packages before starting
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
Data Import¶
import os

def load_dataset(file_path):
    """Load a CSV dataset with basic error handling."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"The file {file_path} was not found. Please check the path.")
    try:
        df = pd.read_csv(file_path)
    except Exception as e:
        # Re-raise with context while keeping the original traceback
        raise RuntimeError(f"Error while loading dataset: {e}") from e
    print(f"Dataset loaded successfully: {df.shape[0]} rows, {df.shape[1]} columns")
    return df

file_path = r"C:\Users\A\Desktop\playstore_app_market_insights\dataset\Google-Playstore.csv"
df = load_dataset(file_path)
Dataset loaded successfully: 2312944 rows, 24 columns
The dataset contains a comprehensive set of app features useful for revenue and user-behavior analysis.
Data Cleaning & Transformation¶
def clean_dataset(df):
    # 1. Handle missing values
    # .copy() makes the filtered frame independent, which avoids SettingWithCopyWarning
    df = df.dropna(subset=['App Name']).copy()
    # Impute missing ratings with the median rating of the app's category
    df['Rating'] = df.groupby('Category')['Rating'].transform(lambda x: x.fillna(x.median()))
    # Flag, then fill, missing release dates with the last-update date
    df['Released_missing'] = df['Released'].isna().astype(int)
    df['Released'] = df['Released'].fillna(df['Last Updated'])
    df['Developer Id'] = df['Developer Id'].fillna("N/A")
    # Flag, then fill, missing minimum installs with maximum installs
    df['min_inst_miss'] = df['Minimum Installs'].isna().astype(int)
    df['Minimum Installs'] = df['Minimum Installs'].fillna(df['Maximum Installs'])
    df['Currency'] = df['Currency'].fillna("N/A")

    # 2. Drop columns not needed for this analysis
    df = df.drop([
        'Developer Website', 'Developer Email', 'Privacy Policy', 'Scraped Time',
        'App Id', 'Installs', 'Rating Count', 'Minimum Android'
    ], axis=1, errors='ignore')

    # 3. Normalize size to kilobytes (e.g. "5.5M" -> 5500.0, "800k" -> 800.0)
    df["Size"] = df["Size"].astype(str).str.replace(",", "").str.replace(" ", "")

    def convert_size(value):
        val = str(value).strip()
        if val.lower() in {"varieswithdevice", "na", "n/a", ""}:
            return np.nan
        try:
            if val[-1].lower() == "m":
                return float(val[:-1]) * 1000
            elif val[-1].lower() == "k":
                return float(val[:-1])
            else:
                return float(val)
        except ValueError:
            # Any other unparseable size becomes missing
            return np.nan

    df["size"] = df["Size"].apply(convert_size)  # size in KB
    df = df.drop(['Size'], axis=1, errors='ignore')

    # 4. Convert boolean flags to 0/1 integers
    for col in ['Free', 'Ad Supported', 'In App Purchases', 'Editors Choice']:
        df[col] = df[col].astype(int)

    # 5. Derived columns
    # Installs are published as ranges, so use the midpoint as a single estimate
    df['avg_installs'] = ((df['Minimum Installs'] + df['Maximum Installs']) / 2).round(0)
    df['Released'] = pd.to_datetime(df['Released'], errors='coerce')
    df['released_year'] = df['Released'].dt.year

    # 6. Rename columns (snake_case)
    df = df.rename(columns={
        "App Name": "app_name",
        "Category": "category",
        "Rating": "rating",
        "Free": "app_status",
        "Currency": "currency",
        "Developer Id": "developer_name",
        "Released": "released_date",
        "Last Updated": "last_update",
        "Content Rating": "content_target",
        "Ad Supported": "ads_flag",
        "In App Purchases": "in_app_purchases_flag",
        "Editors Choice": "play_store_recommend"
    })

    # Ensure app_status agrees with Price (1 = Free, 0 = Paid)
    df.loc[df['Price'] > 0, 'app_status'] = 0
    df.loc[df['Price'] == 0, 'app_status'] = 1

    # 7. Remove duplicate app names, keeping the first occurrence
    df = df.drop_duplicates(['app_name'], keep='first')

    print(f"Cleaning complete: {df.shape[0]} rows, {df.shape[1]} columns remain.")
    return df

df = clean_dataset(df)
Cleaning complete: 2177943 rows, 20 columns remain.
Data Validation Checks¶
Before running the analysis, we validate critical columns to ensure data integrity:
- Ratings must be between 0 and 5.
- Year formats must be valid.
- All required columns must exist in the dataset.
# List of required columns
required_cols = [
    "app_name", "category", "rating", "released_date",
    "last_update", "app_status", "ads_flag",
    "in_app_purchases_flag", "play_store_recommend", "avg_installs"
]

# 1. Check for missing required columns
missing_cols = [col for col in required_cols if col not in df.columns]
if missing_cols:
    print("Missing columns:", missing_cols)
else:
    print("All required columns are present.")

# 2. Validate rating values (should be between 0 and 5)
invalid_ratings = df[~df['rating'].between(0, 5, inclusive="both")]
print(f"Ratings valid: {len(invalid_ratings) == 0}")
if len(invalid_ratings) > 0:
    print("Invalid ratings found:", invalid_ratings['rating'].unique())

# 3. Validate that every released_date yields a year
df['released_year'] = df['released_date'].dt.year
invalid_years = df[df['released_year'].isna()]
print(f"Year formats valid: {len(invalid_years) == 0}")
if len(invalid_years) > 0:
    print("Invalid years detected in released_date.")
All required columns are present.
Ratings valid: True
Year formats valid: True
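The three checks above can also be wrapped in one function that returns a pass/fail report, which makes them easy to re-run after any change to the cleaning step. This is a minimal sketch, assuming the cleaned column names used above; `validate_apps` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

def validate_apps(df, required_cols):
    """Return pass/fail flags for the basic integrity checks (hypothetical helper)."""
    missing = [c for c in required_cols if c not in df.columns]
    report = {"columns_present": not missing}
    if "rating" in df.columns:
        # NaN ratings fail between(), so they also count as invalid here
        report["ratings_valid"] = bool(df["rating"].between(0, 5, inclusive="both").all())
    if "released_date" in df.columns:
        # Unparseable dates become NaT, whose year is NaN
        years = pd.to_datetime(df["released_date"], errors="coerce").dt.year
        report["years_valid"] = bool(years.notna().all())
    return report

# Made-up rows: one out-of-range rating and one unparseable date
toy = pd.DataFrame({
    "app_name": ["A", "B"],
    "rating": [4.5, 6.0],
    "released_date": ["2020-01-05", "not a date"],
})
print(validate_apps(toy, ["app_name", "rating", "released_date"]))
# {'columns_present': True, 'ratings_valid': False, 'years_valid': False}
```

Returning a dict instead of printing lets a pipeline assert on the result, for example `assert all(validate_apps(df, required_cols).values())`.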
Exploratory Data Analysis (EDA)¶
Google Play Apps Overview¶
In this section, we perform an initial exploration of the Google Play dataset to understand the overall app market, including:
- Total number of apps and category distribution.
- Pricing and monetization strategies (Free vs Paid, Ads/IAP)
- Ratings and user engagement.
- Installs and popularity metrics.
These insights will help identify market trends and guide strategic recommendations before focusing on a deep dive into Education apps.
General summary and counts
# 1. General summary
def analyze_total_apps(df):
    total_apps = len(df)
    print(f"The Total number of apps: {total_apps}")
    return total_apps

# 2. Category analysis
def analyze_categories(df, column='category'):
    counts = df[column].value_counts().head(10).reset_index()
    counts.columns = [column, 'Count']
    fig = px.bar(
        counts,
        x=column,
        y='Count',
        title="Top 10 Categories by Number of Apps",
        text='Count'
    )
    fig.update_traces(textposition='outside')
    fig.show()

analyze_total_apps(df)
The Total number of apps: 2177943
2177943
# Category distribution
analyze_categories(df, column = 'category')
The dataset contains 2,177,943 apps, showing a very large and diverse market on Google Play.
The most populated categories are Education, Music & Audio, Business, Tools, Entertainment, and Lifestyle, indicating that these segments dominate the app ecosystem.
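Education being the top category points to strong developer activity in learning and skill-building apps, and plausibly to strong user demand, while the size of Music & Audio and Business shows that entertainment and productivity remain major focuses of the ecosystem. Note that these figures count apps published (supply), not downloads.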
Pricing and Monetization
# 3. Free vs Paid distribution
def plot_free_apps_financing(df):
    # Classify free apps by monetization; 'Nothing' = neither ads nor IAP
    free_apps = df[df['Price'] == 0].copy()
    free_apps['financing_type'] = 'Nothing'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    free_apps.loc[(free_apps['ads_flag'] == 0) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    # Count apps per financing type (the pie chart computes the percentages)
    counts = free_apps['financing_type'].value_counts()
    # Plot donut chart
    fig = px.pie(
        names=counts.index,
        values=counts.values,
        title='Distribution of Free Apps by Financing Type',
        hole=0.3  # donut chart style
    )
    fig.show()

# Financing Trends
def plot_all_apps_trend(df):
    """
    Plot the yearly share of each financing method among all apps (Free and Paid).
    Paid apps are labeled 'Paid' because they are monetized directly.
    """
    apps = df.copy()
    # Ensure 'released_year' is numeric
    apps['released_year'] = pd.to_numeric(apps['released_year'], errors='coerce')
    # Classify financing type
    apps['financing_type'] = 'Nothing'
    # For Free apps
    free_mask = apps['Price'] == 0
    apps.loc[free_mask & (apps['ads_flag'] == 1) & (apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    apps.loc[free_mask & (apps['ads_flag'] == 1) & (apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    apps.loc[free_mask & (apps['ads_flag'] == 0) & (apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    # For Paid apps
    apps.loc[apps['Price'] > 0, 'financing_type'] = 'Paid'
    # Count apps per year and financing type
    yearly_counts = apps.groupby(['released_year', 'financing_type']).size().reset_index(name='count')
    # Convert counts into percentages within each year
    yearly_counts['percentage'] = yearly_counts.groupby('released_year')['count'].transform(lambda x: x / x.sum() * 100)
    # Plot trend
    fig = px.line(yearly_counts, x='released_year', y='percentage', color='financing_type',
                  markers=True,
                  title='Trend of Financing Methods Among All Apps Over Years',
                  labels={'percentage': 'Percentage of Apps', 'released_year': 'Year'})
    fig.show()

# Financing types among free apps
plot_free_apps_financing(df)
Almost half of free apps (48.4%) fall under "Nothing": they use neither ads nor in-app purchases (IAP). These apps may have no direct revenue model, possibly relying on external funding or serving promotional purposes.
42.97% of free apps monetize with ads only, the most common strategy among monetized free apps.
Only 6.27% of free apps combine ads and IAP, so dual monetization is relatively rare, though it may be more effective for maximizing revenue.
2.33% of free apps rely solely on IAP, the least common approach.
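The financing labels above come from two 0/1 flags plus the price, and the notebook repeats the same `.loc` rules in several functions. A minimal sketch of writing the classification once, vectorized with `np.select`, so every chart shares identical rules; `classify_financing` is a hypothetical helper and the toy rows are made up.

```python
import numpy as np
import pandas as pd

def classify_financing(df):
    """Label each app by how it is monetized (hypothetical helper, same rules as above)."""
    ads = (df["ads_flag"] == 1).to_numpy()
    iap = (df["in_app_purchases_flag"] == 1).to_numpy()
    paid = (df["Price"] > 0).to_numpy()
    conditions = [paid, ads & iap, ads, iap]
    choices = ["Paid", "Ads + IAP", "Ads only", "IAP only"]
    # np.select returns the choice of the FIRST matching condition, so 'Paid' wins
    return pd.Series(np.select(conditions, choices, default="Nothing"), index=df.index)

toy = pd.DataFrame({
    "Price": [0, 0, 0, 0, 2.99],
    "ads_flag": [1, 1, 0, 0, 1],
    "in_app_purchases_flag": [1, 0, 1, 0, 0],
})
labels = classify_financing(toy)
print(labels.tolist())
# ['Ads + IAP', 'Ads only', 'IAP only', 'Nothing', 'Paid']

# Shares in percent, as shown in the pie chart
print((labels.value_counts(normalize=True) * 100).round(2))
```

Because condition order encodes priority, putting `paid` first reproduces the rule in `plot_all_apps_trend` where any app with a price is labeled 'Paid' regardless of its flags.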
plot_all_apps_trend(df)
Ratings & user engagement
# 4. Rating distribution
def analyze_rating_distribution(df):
    # A rating of 0 typically indicates an unrated app, so exclude it
    df_filtered = df[df['rating'] > 0]
    fig = px.histogram(df_filtered, x="rating", nbins=20, title="Ratings Distribution")
    fig.show()

analyze_rating_distribution(df)
Installs & popularity metrics
def get_category_installs(df):
    # Top 5 categories by mean avg_installs
    category_installs = (
        df.groupby('category')['avg_installs'].mean()
        .sort_values(ascending=False).head()
        .round(0).reset_index()
    )
    fig1 = px.bar(category_installs, x='category', y='avg_installs',
                  title='Top Categories by Average Installs',
                  labels={'avg_installs': 'Average Installs', 'category': 'Category'},
                  text='avg_installs')
    fig1.show()

    # Top 3 apps by installs
    top3 = df.sort_values(by='avg_installs', ascending=False).head(3)
    fig2 = px.bar(top3, x='app_name', y='avg_installs', color='category',
                  title='Top 3 Apps by Installs',
                  text='avg_installs')
    fig2.show()

get_category_installs(df)
Deep Dive: Education Apps¶
This section focuses on Education apps, with the goal of providing insights and recommendations directly relevant to Xpertbot's Education app strategy.
Objectives:
- Understand the Education app market, user engagement, and monetization trends.
- Identify high-performing apps and successful strategies.
- Provide actionable recommendations to improve Xpertbot's app downloads, ratings, and revenue.
General summary & key metrics
def education_count(df):
    # The dataset uses two labels for learning apps: 'Education' and 'Educational'
    edu_app = df[df['category'].isin(['Educational', 'Education'])]
    edu_count = edu_app.shape[0]
    avg_installs = edu_app['avg_installs'].mean().round(0)
    # Exclude unrated apps (rating == 0) from the average rating
    subset_nonzero = edu_app[edu_app['rating'] != 0]
    avg_rating_edu = subset_nonzero['rating'].mean().round(2)
    print("The number of education apps is:", edu_count)
    print("\nThe average installs of education apps is:", avg_installs)
    print("\nThe average rating of the education apps is:", avg_rating_edu)

education_count(df)
The number of education apps is: 248212

The average installs of education apps is: 56330.0

The average rating of the education apps is: 4.18
Education apps are numerous (~248k) with solid user engagement (about 56.3k installs on average, a mean likely pulled upward by a few very large apps) and a high satisfaction level (average rating 4.18), indicating both strong demand and a generally positive user experience.
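Install counts are extremely skewed (the top Education apps exceed 100 million installs), so a mean can sit far above what a typical app achieves. A toy example with made-up numbers shows how a single blockbuster moves the mean but barely moves the median:

```python
import pandas as pd

# Made-up install counts: nine small apps and one blockbuster
installs = pd.Series([500, 1_000, 1_000, 5_000, 5_000, 10_000, 10_000, 50_000, 50_000, 100_000_000])
print(f"mean:   {installs.mean():,.0f}")    # mean:   10,013,250
print(f"median: {installs.median():,.0f}")  # median: 7,500
```

Reporting `edu_app['avg_installs'].median()` alongside the mean would show how typical an install count of 56.3k really is.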
Pricing & Monetization
def education_free_paid_stats(df):
    # Filter Education apps
    edu_app = df[df['category'].isin(['Educational', 'Education'])]
    status = edu_app['app_status'].map({1: 'Free', 0: 'Paid'})
    # Count and percentage of Free and Paid apps
    free_paid_count = status.value_counts()
    free_paid_percentage = status.value_counts(normalize=True) * 100
    # Return values for reuse
    return free_paid_count, free_paid_percentage

def plot_education_free_paid(free_paid_count, free_paid_percentage):
    # Prepare data
    df_plot = free_paid_count.reset_index()
    df_plot.columns = ['Status', 'Count']
    # Align percentages with counts by label rather than by position
    df_plot['Percentage'] = free_paid_percentage.reindex(df_plot['Status']).values.round(2)
    # Bar chart
    fig = px.bar(df_plot, x='Status', y='Count', text='Percentage',
                 title='Free vs Paid Education Apps',
                 labels={'Count': 'Number of Apps', 'Status': 'App Status'})
    fig.update_traces(texttemplate='%{text}%', textposition='outside')
    fig.show()

def avg_paid_education_price(df):
    # Filter paid Education apps
    education_app = df[df['category'].isin(['Educational', 'Education'])]
    paid_education_apps = education_app[education_app['Price'] > 0]
    # Compute average price
    avg_price = paid_education_apps['Price'].mean().round(3)
    print(f'The average price of paid Education apps is ${avg_price}')
    # Return value for reuse
    return avg_price

# Compute stats first
free_paid_count, free_paid_percentage = education_free_paid_stats(df)
plot_education_free_paid(free_paid_count, free_paid_percentage)
avg_price = avg_paid_education_price(df)
The average price of paid Education apps is $5.721
Paid Education apps are moderately priced on average (~$5.72), suggesting a low-cost barrier that aligns with accessibility and mass adoption strategies.
Installs & Revenue Metrics
# Define function
def revenue_summary(df):
    """Estimate revenue of paid apps as Price x avg_installs (education vs all apps)."""
    # Paid apps with at least some installs
    paid_apps = df[(df['Price'] > 0) & (df['avg_installs'] > 0)].copy()
    paid_apps['revenue'] = paid_apps['Price'] * paid_apps['avg_installs']
    # Education revenue (both 'Education' and 'Educational' labels)
    edu_revenue = paid_apps[paid_apps['category'].isin(['Educational', 'Education'])]['revenue'].sum()
    total_revenue = paid_apps['revenue'].sum()
    edu_share = (edu_revenue / total_revenue) * 100
    print("Total Estimated Revenue Across All Paid Apps: ${:,.0f}".format(total_revenue))
    print("Education Apps Revenue: ${:,.0f}".format(edu_revenue))
    print("Education Share of Total Paid Revenue: {:.2f}%".format(edu_share))
    # Paid education apps for the top-5 table.
    # Note: unlike edu_revenue above, this filter covers only the 'Education' label.
    edu_paid = df[(df['category'] == 'Education') & (df['Price'] > 0) & (df['avg_installs'] > 0)].copy()
    edu_paid['revenue'] = edu_paid['Price'] * edu_paid['avg_installs']
    return edu_revenue, total_revenue, edu_share, edu_paid

# Run the function first to get edu_paid
edu_revenue, total_revenue, edu_share, edu_paid = revenue_summary(df)

# Top 5 by estimated revenue, formatted with thousands separators
top5_revenue_apps = edu_paid.sort_values(by="revenue", ascending=False).head(5).copy()
top5_revenue_apps['revenue'] = top5_revenue_apps['revenue'].apply(lambda x: f"${x:,.0f}")
top5_revenue_apps['avg_installs'] = top5_revenue_apps['avg_installs'].apply(lambda x: f"{x:,.0f}")

# Show as table
from IPython.display import display
display(top5_revenue_apps[['app_name', 'Price', 'avg_installs', 'revenue']])
Total Estimated Revenue Across All Paid Apps: $2,071,291,277
Education Apps Revenue: $116,620,919
Education Share of Total Paid Revenue: 5.63%
| | app_name | Price | avg_installs | revenue |
|---|---|---|---|---|
| 1610632 | Driving Theory Test 4 in 1 Kit + Hazard Percep... | 5.490000 | 659,096 | $3,618,437 |
| 903078 | Toca Lab: Elements | 3.990000 | 646,984 | $2,581,466 |
| 260116 | Toca Life: City | 3.990000 | 533,906 | $2,130,285 |
| 809596 | Driving school theory - Fahrlehrer24 | 20.015573 | 104,906 | $2,099,754 |
| 861489 | Official DVSA Theory Test Kit | 5.490000 | 298,568 | $1,639,138 |
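The revenue figures above rest on a simple proxy: upfront price multiplied by the install midpoint, summed over paid apps. A minimal sketch of that arithmetic, using the price and installs of the first table row plus two made-up rows; `estimate_revenue` is a hypothetical helper, and the proxy ignores IAP and ad income, store fees, and refunds.

```python
import pandas as pd

def estimate_revenue(df):
    """Revenue proxy used above: Price x avg_installs for paid apps with installs."""
    paid = df[(df["Price"] > 0) & (df["avg_installs"] > 0)].copy()
    paid["revenue"] = paid["Price"] * paid["avg_installs"]
    return paid

toy = pd.DataFrame({
    "app_name": ["Driving theory app (first table row)", "Made-up tool", "Made-up free app"],
    "category": ["Education", "Tools", "Education"],
    "Price": [5.49, 1.99, 0.0],
    "avg_installs": [659_096, 10_000, 5_000_000],
})
paid = estimate_revenue(toy)
share = paid.loc[paid["category"] == "Education", "revenue"].sum() / paid["revenue"].sum() * 100
print(f"${paid['revenue'].iloc[0]:,.0f}")  # $3,618,437
print(f"Education share: {share:.2f}%")
```

The free app drops out entirely, which is why this proxy cannot credit free Education apps with any revenue even when they monetize through ads or IAP.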
Developers Analysis
def top_education_developers(df, top_n=10):
    # Filter Education apps
    education_app = df[df['category'] == 'Education']
    # Count apps per developer
    dev_by_app = education_app['developer_name'].value_counts().head(top_n)
    df_plot = dev_by_app.reset_index()
    df_plot.columns = ['Developer', 'Number of Apps']
    fig = px.bar(df_plot, x='Number of Apps', y='Developer', orientation='h',
                 title=f'Top {top_n} Education App Developers',
                 text='Number of Apps')
    fig.update_layout(yaxis={'categoryorder': 'total ascending'})  # largest on top
    fig.show()

top_education_developers(df)
Financing models analysis
def iap_stats(df):
    # Education apps (both category labels)
    education_app = df[df['category'].isin(['Educational', 'Education'])]
    edu_with_iap = education_app['in_app_purchases_flag'].value_counts()
    df_plot1 = edu_with_iap.reset_index()
    df_plot1.columns = ['IAP', 'Count']
    df_plot1['IAP'] = df_plot1['IAP'].map({0: 'No IAP', 1: 'Has IAP'})
    fig1 = px.pie(df_plot1, names='IAP', values='Count',
                  title='Education Apps: With vs Without IAP')
    fig1.show()

def ads_distribution_education(df):
    edu_apps = df[df['category'].isin(['Educational', 'Education'])]
    # Count apps with vs without ads
    ads_counts = edu_apps['ads_flag'].map({1: "With Ads", 0: "No Ads"}).value_counts().reset_index()
    ads_counts.columns = ["Ads Status", "Count"]
    # Plot interactive donut chart (plotly computes the percentages)
    fig = px.pie(
        ads_counts,
        names="Ads Status",
        values="Count",
        title="Ads Distribution in Education Apps",
        hole=0.3
    )
    fig.show()

def plot_education_financing(df):
    # Filter education apps
    education_app = df[df['category'].isin(['Education', 'Educational'])]

    # Classify free apps
    free_apps = education_app[education_app['Price'] == 0].copy()
    free_apps['financing_type'] = 'Nothing'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    free_apps.loc[(free_apps['ads_flag'] == 0) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    counts = free_apps['financing_type'].value_counts()

    # Classify paid apps (any price above 0, not only $1 apps)
    paid_apps = education_app[education_app['Price'] > 0].copy()
    paid_apps['financing_type'] = 'Paid only'
    paid_apps.loc[(paid_apps['ads_flag'] == 1) & (paid_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Paid + Ads + IAP'
    paid_apps.loc[(paid_apps['ads_flag'] == 1) & (paid_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Paid + Ads'
    paid_apps.loc[(paid_apps['ads_flag'] == 0) & (paid_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Paid + IAP'
    paid_counts = paid_apps['financing_type'].value_counts()

    # Donut chart: free apps
    fig_1 = px.pie(
        names=counts.index,
        values=counts.values,
        title='Distribution of Free Education Apps by Financing Type',
        hole=0.3
    )
    fig_1.show()

    # Donut chart: paid apps
    fig_2 = px.pie(
        names=paid_counts.index,
        values=paid_counts.values,
        title='Distribution of Paid Education Apps by Financing Type',
        hole=0.3
    )
    fig_2.show()

iap_stats(df)
Only 7.6% of Education apps use in-app purchases (IAP). The vast majority (92.4%) rely on other revenue models (ads or an upfront price) or none at all.
ads_distribution_education(df)
Ads are a common, though not dominant, monetization strategy in Education apps. Heavy ad loads may hurt the user experience, although this dataset does not measure that directly.
plot_education_financing(df)
The free segment is split: a majority simply offer free access (possibly funded externally), while others lean towards ads. Very few free apps use IAP, which suggests that monetizing educational content directly is less common.
In the paid segment, ads appear to be uncommon, which is consistent with buyers expecting an ad-free premium experience; mixing ads into a paid app is less common and potentially risky.
Recommendations¶
The dominant and most accepted model in Education apps is Free (with optional ads or IAP).
Launching as a free app will align with user expectations and maximize reach.
Ads can be used moderately, but IAP (premium features, certificates, or advanced content) could be Xpertbot’s main monetization path, since apps combining Free + Ads + IAP tend to capture both installs and revenue streams (as the final comparative analysis examines).
Final Comparative Analysis¶
To better understand how Education apps position themselves in the wider Play Store ecosystem, we compare different financing strategies, adoption levels, and user ratings. This section evaluates the performance of free vs. paid models, the effectiveness of ads and in-app purchases, and the strategies used by top-performing apps. The goal is to highlight which approaches drive the highest installs and ratings, and to draw practical lessons for Xpertbot’s own education app.
Education apps & financing strategies
def top10_educational_apps(df):
    """Show the top 10 education apps by installs and their financing strategy."""
    # Filter education apps (both category labels)
    edu_apps = df[df['category'].isin(['Educational', 'Education'])]
    # Sort by installs; .copy() so a new column can be added safely
    top10_edu = edu_apps.sort_values(by="avg_installs", ascending=False).head(10).copy()

    # Define financing strategy
    def financing(row):
        if row['app_status'] == 0:  # Paid app
            return "Paid App"
        elif row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Ads Only"
        elif row['in_app_purchases_flag'] == 1:
            return "IAP Only"
        else:
            return "Free (No Revenue)"

    top10_edu["financing_strategy"] = top10_edu.apply(financing, axis=1)
    # Select relevant columns
    return top10_edu[["app_name", "avg_installs", "financing_strategy", "category", "rating"]]

def top_paid_edu_apps(df, n=10):
    # Filter only education + paid apps (app_status 0 = Paid)
    paid_edu = df[(df['category'].isin(['Education', 'Educational'])) & (df['app_status'] == 0)].copy()

    # Define financing strategy
    def financing(row):
        if row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Paid + Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Paid + Ads"
        elif row['in_app_purchases_flag'] == 1:
            return "Paid + IAP"
        else:
            return "Paid only"

    paid_edu['financing_strategy'] = paid_edu.apply(financing, axis=1)
    # Sort by installs and select top N
    top_paid_edu = paid_edu.sort_values(by="avg_installs", ascending=False).head(n)
    return top_paid_edu[['app_name', 'avg_installs', 'Price', 'financing_strategy']]

top10_educational_apps(df)
| | app_name | avg_installs | financing_strategy | category | rating |
|---|---|---|---|---|---|
| 1050190 | Duolingo: Learn Languages Free | 180715565.0 | Ads + IAP | Education | 4.6 |
| 122353 | Google Classroom | 156008640.0 | Free (No Revenue) | Education | 2.6 |
| 1682537 | Samsung Global Goals | 146340788.0 | Ads Only | Education | 4.5 |
| 336867 | Toca Kitchen 2 | 126884584.0 | Ads Only | Educational | 4.2 |
| 695655 | Photomath | 123678395.0 | IAP Only | Education | 4.7 |
| 178286 | U-Dictionary: Oxford Dictionary Free Now Trans... | 112914354.0 | Ads + IAP | Education | 4.5 |
| 2144949 | Masha and the Bear. Educational Games | 111844136.0 | Ads + IAP | Educational | 4.1 |
| 388316 | Truck games for kids - build a house, car wash | 107404399.0 | Ads + IAP | Educational | 4.1 |
| 1204946 | Brainly – Home Learning & Homework Help | 105322199.0 | Free (No Revenue) | Education | 4.3 |
| 725710 | Baby Panda's Supermarket | 104487545.0 | Ads + IAP | Educational | 4.3 |
Hybrid monetization is common at the top: 5 of the 10 most-installed Education apps combine Ads + IAP, while the rest use ads only (2), IAP only (1), or no in-app monetization (2).
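The tally behind that statement can be reproduced from the table. A minimal sketch using the ten strategy labels copied from the rows above:

```python
from collections import Counter

# Financing strategies of the top-10 Education apps, in table order
top10_strategies = [
    "Ads + IAP", "Free (No Revenue)", "Ads Only", "Ads Only", "IAP Only",
    "Ads + IAP", "Ads + IAP", "Ads + IAP", "Free (No Revenue)", "Ads + IAP",
]
tally = Counter(top10_strategies)
print(tally.most_common())
# [('Ads + IAP', 5), ('Free (No Revenue)', 2), ('Ads Only', 2), ('IAP Only', 1)]
```

Inside the notebook itself, `top10_educational_apps(df)['financing_strategy'].value_counts()` gives the same counts without copying labels by hand.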
top_paid_edu_apps(df)
| | app_name | avg_installs | Price | financing_strategy |
|---|---|---|---|---|
| 366246 | Peppa Pig: Theme Park | 1661773.0 | 2.99 | Paid only |
| 1304647 | Peppa Pig: Sports Day | 1314754.0 | 2.99 | Paid only |
| 1610632 | Driving Theory Test 4 in 1 Kit + Hazard Percep... | 659096.0 | 5.49 | Paid + IAP |
| 1422516 | Teach Your Monster to Read: Phonics & Reading ... | 650534.0 | 4.99 | Paid only |
| 903078 | Toca Lab: Elements | 646984.0 | 3.99 | Paid + Ads |
| 1284447 | Calc Fast | 554006.0 | 0.99 | Paid only |
| 263851 | My Town : School | 544722.0 | 2.99 | Paid + IAP |
| 260116 | Toca Life: City | 533906.0 | 3.99 | Paid + Ads |
| 392819 | Speed Math 2018 - Pro | 533785.0 | 0.99 | Paid + Ads |
| 831818 | My City : After School | 517785.0 | 2.99 | Paid only |
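Paid apps can still succeed, but their ceiling is far lower: the most-installed paid Education app (Peppa Pig: Theme Park) averages about 1.66M installs, versus about 181M for the top free app (Duolingo). Paid success is concentrated in niches such as games that parents buy for their children and test-prep apps.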
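The size of that gap follows from the two tables. A minimal sketch with the top two apps of each group copied from the rows above (1 = Free, 0 = Paid); the same `groupby(...).max()` line works unchanged on the full Education subset.

```python
import pandas as pd

# Top two apps per group, copied from the two tables above
apps = pd.DataFrame({
    "app_name": ["Duolingo: Learn Languages Free", "Google Classroom",
                 "Peppa Pig: Theme Park", "Peppa Pig: Sports Day"],
    "app_status": [1, 1, 0, 0],
    "avg_installs": [180_715_565, 156_008_640, 1_661_773, 1_314_754],
})
# Highest install count within each group (the "ceiling")
ceiling = apps.groupby("app_status")["avg_installs"].max()
ratio = ceiling[1] / ceiling[0]
print(f"Free/paid ceiling ratio: {ratio:.0f}x")  # Free/paid ceiling ratio: 109x
```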
def free_vs_paid_performance(df):
    """Compare installs, ratings, and financing strategies between free and paid education apps."""
    edu_apps = df[df['category'] == 'Education'].copy()
    # Exclude unrated apps (rating == 0)
    edu_apps = edu_apps[edu_apps['rating'] > 0]
    # Split datasets
    free_apps = edu_apps[edu_apps['app_status'] == 1]  # Free
    paid_apps = edu_apps[edu_apps['app_status'] == 0]  # Paid

    # --- Summary stats ---
    summary = pd.DataFrame({
        "Avg Installs": [free_apps['avg_installs'].mean(), paid_apps['avg_installs'].mean()],
        "Median Installs": [free_apps['avg_installs'].median(), paid_apps['avg_installs'].median()],
        "Avg Rating": [free_apps['rating'].mean(), paid_apps['rating'].mean()],
        "Median Rating": [free_apps['rating'].median(), paid_apps['rating'].median()],
        "App Count": [len(free_apps), len(paid_apps)]
    }, index=["Free", "Paid"])

    # --- Financing strategies ---
    def financing_breakdown(subset):
        no_financing = ((subset['ads_flag'] == 0) & (subset['in_app_purchases_flag'] == 0)).sum()
        ads_only = ((subset['ads_flag'] == 1) & (subset['in_app_purchases_flag'] == 0)).sum()
        iap_only = ((subset['ads_flag'] == 0) & (subset['in_app_purchases_flag'] == 1)).sum()
        both = ((subset['ads_flag'] == 1) & (subset['in_app_purchases_flag'] == 1)).sum()
        return pd.Series({
            "No Financing": no_financing,
            "Ads only": ads_only,
            "In-App Purchases only": iap_only,
            "Ads & IAP": both
        })

    financing = pd.DataFrame({
        "Free": financing_breakdown(free_apps),
        "Paid": financing_breakdown(paid_apps)
    }).T

    # --- Visualization: installs (log scale because installs are highly skewed) ---
    fig1 = px.box(
        edu_apps,
        x="app_status",
        y="avg_installs",
        title="Distribution of Installs: Free vs Paid Education Apps",
        labels={"app_status": "App Type (1=Free, 0=Paid)", "avg_installs": "Average Installs"},
        log_y=True
    )
    fig1.show()

    # Return the tables so they can be displayed and reused
    return summary, financing

summary, financing = free_vs_paid_performance(df)
display(summary.round(2))
Ratings are about the same for both groups (both ~4.2).
Installs are heavily concentrated in free apps. Because paid apps are not rated higher, charging upfront brings no visible quality advantage; it mainly limits adoption.
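The hand-built summary table in `free_vs_paid_performance` can also be expressed as one `groupby().agg()` call with named aggregations, which extends to any grouping column (for example the monetization label used next). A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up education apps (1 = Free, 0 = Paid)
apps = pd.DataFrame({
    "app_status": [1, 1, 1, 0, 0],
    "avg_installs": [1_000, 50_000, 5_000_000, 500, 10_000],
    "rating": [4.0, 4.4, 4.6, 4.1, 4.3],
})
summary = (
    apps.assign(group=apps["app_status"].map({1: "Free", 0: "Paid"}))
        .groupby("group")
        .agg(avg_installs=("avg_installs", "mean"),
             median_installs=("avg_installs", "median"),
             avg_rating=("rating", "mean"),
             app_count=("rating", "count"))
)
print(summary.round(2))
```

In this toy data the free group's mean installs (about 1.68M) is far above its median (50,000), the same skew that makes medians worth reporting next to averages.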
Financing strategy effectiveness
def financing_strategy_effectiveness(df):
    edu_apps = df[df['category'] == 'Education'].copy()
    edu_apps = edu_apps[edu_apps['rating'] > 0]  # drop unrated apps

    # Define monetization type
    def get_strategy(row):
        if row['app_status'] == 0:  # Paid
            return "Paid"
        elif row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Free + Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Free + Ads"
        elif row['in_app_purchases_flag'] == 1:
            return "Free + IAP"
        else:
            return "Free Only"

    edu_apps['monetization'] = edu_apps.apply(get_strategy, axis=1)

    # Summary stats per strategy
    summary = edu_apps.groupby('monetization').agg(
        Avg_Installs=('avg_installs', 'mean'),
        Median_Installs=('avg_installs', 'median'),
        Avg_Rating=('rating', 'mean'),
        Median_Rating=('rating', 'median'),
        App_Count=('app_name', 'count')
    ).sort_values(by='Avg_Installs', ascending=False)

    # Visualization: installs
    fig1 = px.bar(
        summary.reset_index(),
        x="monetization", y="Avg_Installs",
        color="monetization",
        title="Average Installs by Financing Strategy (Education Apps)",
        log_y=True,
        labels={"Avg_Installs": "Average Installs (log scale)"}
    )
    fig1.show()

    # Visualization: ratings
    fig2 = px.bar(
        summary.reset_index(),
        x="monetization", y="Avg_Rating",
        color="monetization",
        title="Average Ratings by Financing Strategy (Education Apps)",
        labels={"Avg_Rating": "Average Rating (0–5)"}
    )
    fig2.show()

    return summary

# Run
financing_summary = financing_strategy_effectiveness(df)
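By average installs, the most effective strategies are Free + IAP and Free + Ads + IAP. Ads-only apps underperform, while pure Free or Paid models miss out on monetization potential. Because averages are driven by a few very large apps, these rankings are worth checking against the median installs that the function also computes.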
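One way to make "most effective" concrete is to compare each strategy's share of installs with its share of apps: a ratio above 1 means the strategy captures more installs than its app count alone would suggest. This is an additional view, not part of the original analysis; the rows below are made up and only mimic the labels produced by `get_strategy`.

```python
import pandas as pd

# Made-up apps labeled with the monetization categories used above
apps = pd.DataFrame({
    "monetization": ["Free + Ads + IAP", "Free + Ads + IAP", "Free + Ads",
                     "Free + Ads", "Free + Ads", "Free Only", "Paid"],
    "avg_installs": [2_000_000, 500_000, 10_000, 50_000, 5_000, 20_000, 3_000],
})
g = apps.groupby("monetization")["avg_installs"]
lift = pd.DataFrame({
    "app_share": g.size() / len(apps),
    "install_share": g.sum() / apps["avg_installs"].sum(),
})
# lift > 1: the strategy captures more installs than its share of apps
lift["lift"] = lift["install_share"] / lift["app_share"]
print(lift.sort_values("lift", ascending=False).round(2))
```

Unlike a plain average, this share-based view shows how much of the total market each strategy actually reaches.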
Recommendation for Xpertbot¶
Go Free at Launch: To gain traction, Xpertbot should launch its app for free.
Adopt a Freemium Model (Free + IAP, optionally Ads):
Offer core features for free, but lock advanced features, certifications, or premium content behind in-app purchases.
Ads can be included in the free version but must be limited to avoid hurting ratings.
Avoid Paid-only strategy: It drastically reduces adoption with no rating advantage.
Position Against Competitors: Duolingo (Free + Ads + IAP) and Photomath (Free + IAP) show that freemium models can be scalable, sustainable, and well accepted by users.
*Best Strategy for Xpertbot:*
Adopt a Free + IAP (with optional ads) monetization model. Focus on strong user experience to secure high ratings, while gradually monetizing through advanced features or premium tiers.
Limitations & Next Steps¶
This analysis offers useful insights into education apps on the Play Store but has some limitations. The dataset is a snapshot in time and may not reflect the newest apps or removals. Installs were reported as ranges and averaged into a midpoint, which can distort results, especially for very large apps. Missing values required imputation, and revenue was estimated as price multiplied by average installs rather than taken from actual earnings, so it ignores in-app purchase and ad income, store fees, refunds, and price changes. Results should therefore be read as indicative rather than exact.
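The cost of averaging install ranges can be quantified: if the true count lies anywhere in the range, the midpoint can be off by at most half the range width. A minimal sketch with made-up ranges; `install_midpoint` is a hypothetical helper that mirrors the `avg_installs` formula from the cleaning step.

```python
def install_midpoint(min_installs, max_installs):
    """Midpoint estimate and its worst-case relative error (half-range / midpoint)."""
    mid = (min_installs + max_installs) / 2
    rel_err = (max_installs - min_installs) / 2 / mid
    return mid, rel_err

# Made-up ranges for a small app and a very large app
for lo, hi in [(1_000, 1_500), (100_000_000, 260_000_000)]:
    mid, err = install_midpoint(lo, hi)
    print(f"{lo:,}-{hi:,}: midpoint {mid:,.0f}, up to ±{err:.0%}")
# 1,000-1,500: midpoint 1,250, up to ±20%
# 100,000,000-260,000,000: midpoint 180,000,000, up to ±44%
```

Wide ranges at the top of the scale produce large absolute errors, which is why rankings among the biggest apps are the least reliable part of the install analysis.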
For next steps, the analysis could be deepened by segmenting education apps into subcategories (e.g., language learning, test prep, kids’ games) and tracking trends over time. Benchmarking top competitors would reveal best practices, while analyzing user reviews could highlight needs and pain points. Building an interactive dashboard would give Xpertbot decision-makers a dynamic view of the market, and once the app is live, A/B testing different monetization models would confirm which strategies work best in practice.
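For the A/B testing step, one common way to compare two monetization variants is a two-proportion z-test on a conversion rate, for example the share of users who make a purchase. A minimal sketch using only the standard library; the counts are made up and the 0.05 significance threshold is a conventional assumption, not something this analysis prescribes.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Made-up results: variant A = Free + Ads, variant B = Free + IAP
z, p = two_proportion_z_test(conv_a=120, n_a=5_000, conv_b=165, n_b=5_000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A pooled test assumes users are randomly assigned to variants and that each user converts independently; retention or revenue per user would need a different test.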
In summary, the education app market is both promising and competitive. Free apps dominate adoption, while hybrid models (Free + IAP + Ads) drive the strongest performance. For Xpertbot, success will depend on offering a high-quality free app with thoughtful monetization through in-app purchases and, where appropriate, ads. Looking forward, continuous monitoring and A/B testing will help refine this strategy, ensuring sustainable growth and user satisfaction in the evolving education market.